(57) Abstract

A voice recognition system having a feature extraction apparatus (22) is located in a remote station (40). The feature extraction
apparatus (22) extracts features from an input speech frame and then provides the extracted features to a central processing station (42). In
the central processing station (42), the features are provided to a word decoder (48) which determines the syntax of the input speech frame.

DISTRIBUTED VOICE RECOGNITION SYSTEM
BACKGROUND OF THE INVENTION 

I. Field of the Invention

The present invention relates to speech signal processing. More particularly, 
the present invention relates to a novel method and apparatus for realizing a 
distributed implementation of a standard voice recognition system. 

II. Description of the Related Art

Voice recognition represents one of the most important techniques to
endow a machine with simulated intelligence to recognize user or user-voiced
commands and to facilitate human interface with the machine. It also
represents a key technique for human speech understanding. Systems that
employ techniques to recover a linguistic message from an acoustic speech
signal are called voice recognizers (VR). A voice recognizer is composed of an
acoustic processor, which extracts a sequence of information-bearing features
(vectors) necessary for VR from the incoming raw speech, and a word decoder,
which decodes this sequence of features (vectors) to yield the meaningful and
desired format of output, such as a sequence of linguistic words corresponding
to the input utterance. To increase the performance of a given system, training
is required to equip the system with valid parameters. In other words, the
system needs to learn before it can function optimally.

The acoustic processor represents a front end speech analysis subsystem
in a voice recognizer. In response to an input speech signal, it provides an
appropriate representation to characterize the time-varying speech signal. It
should discard irrelevant information such as background noise, channel
distortion, speaker characteristics and manner of speaking. Efficient acoustic
features will furnish voice recognizers with higher acoustic discrimination
power. The most useful characteristic is the short time spectral envelope. In
characterizing the short time spectral envelope, the two most commonly used
spectral analysis techniques are linear predictive coding (LPC) and filter-bank
based spectral analysis models. However, it is readily shown (as discussed in
Rabiner, L.R. and Schafer, R.W., Digital Processing of Speech Signals, Prentice
Hall, 1978) that LPC not only provides a good approximation to the vocal tract
spectral envelope, but is considerably less expensive in computation than the
filter-bank model in all digital implementations. Experience has also
demonstrated that the performance of LPC based voice recognizers is
comparable to or better than that of filter-bank based recognizers (Rabiner, L.R.
and Juang, B.H., Fundamentals of Speech Recognition, Prentice Hall, 1993).

Referring to Figure 1, in an LPC based acoustic processor, the input
speech is provided to a microphone (not shown) and converted to an analog
electrical signal. This electrical signal is then digitized by an A/D converter
(not shown). The digitized speech signals are passed through preemphasis
filter 2 in order to spectrally flatten the signal and to make it less susceptible to
finite precision effects in subsequent signal processing. The preemphasis
filtered speech is then provided to segmentation element 4 where it is
segmented or blocked into either temporally overlapped or nonoverlapped
blocks. The frames of speech data are then provided to windowing element 6
where framed DC components are removed and a digital windowing operation
is performed on each frame to lessen the blocking effects due to the
discontinuity at frame boundaries. A most commonly used window function in
LPC analysis is the Hamming window, w(n), defined as:

$$w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1 \tag{1}$$

The windowed speech is provided to LPC analysis element 8. In LPC analysis
element 8, autocorrelation functions are calculated based on the windowed
samples, and the corresponding LPC parameters are obtained directly from the
autocorrelation functions.
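
The windowing and LPC analysis steps described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the text does not say how the LPC parameters are obtained from the autocorrelation values, so the Levinson-Durbin recursion is assumed here as one standard choice, and the function names are hypothetical. The predictor convention is A(z) = 1 - sum_i a_i z^-i.

```python
import math

def hamming(N):
    """Hamming window of equation (1): w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1))."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (N - 1)) for n in range(N)]

def autocorrelation(x, max_lag):
    """R[k] = sum_n x[n] * x[n+k] for lags k = 0..max_lag."""
    N = len(x)
    return [sum(x[n] * x[n + k] for n in range(N - k)) for k in range(max_lag + 1)]

def levinson_durbin(R, P):
    """Assumed method: solve the autocorrelation normal equations for a[1..P].

    Returns (a, prediction_error); a[0] is unused and left at 0.
    """
    a = [0.0] * (P + 1)
    err = R[0]
    for i in range(1, P + 1):
        # Reflection coefficient for order i.
        acc = R[i] - sum(a[j] * R[i - j] for j in range(1, i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a, err
```

In use, a frame would be multiplied sample by sample with `hamming(len(frame))`, passed to `autocorrelation` with `max_lag` equal to the LPC order, and the result handed to `levinson_durbin`.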

Generally speaking, the word decoder translates the acoustic feature
sequence produced by the acoustic processor into an estimate of the speaker's
original word string. This is accomplished in two steps: acoustic pattern
matching and language modeling. Language modeling can be avoided in the
applications of isolated word recognition. The LPC parameters from LPC
analysis element 8 are provided to acoustic pattern matching element 10 to
detect and classify possible acoustic patterns, such as phonemes, syllables,
words, etc. The candidate patterns are provided to language modeling
element 12, which models the rules of syntactic constraints that determine what
sequences of words are grammatically well formed and meaningful. Syntactic
information can be a valuable guide to voice recognition when acoustic
information alone is ambiguous. Based on language modeling, the VR
sequentially interprets the acoustic feature matching results and provides the
estimated word string.

Both the acoustic pattern matching and language modeling in the word
decoder require a mathematical model, either deterministic or stochastic, to
describe the speaker's phonological and acoustic-phonetic variations. The
performance of a speech recognition system is directly related to the quality of
these two modelings. Among the various classes of models for acoustic pattern
matching, template-based dynamic time warping (DTW) and stochastic hidden
Markov modeling (HMM) are the two most commonly used. However, it has
been shown that the DTW based approach can be viewed as a special case of the HMM
based one, which is a parametric, doubly stochastic model. HMM systems are
currently the most successful speech recognition algorithms. The doubly
stochastic property in HMM provides better flexibility in absorbing acoustic as
well as temporal variations associated with speech signals. This usually results
in improved recognition accuracy. Concerning the language model, a
stochastic model, called the k-gram language model, which is detailed in F. Jelinek,
"The Development of an Experimental Discrete Dictation Recognizer", Proc. IEEE,
vol. 73, pp. 1616-1624, 1985, has been successfully applied in practical large
vocabulary voice recognition systems. In the small vocabulary case, a
deterministic grammar has been formulated as a finite state network (FSN) in
the application of an airline reservation and information system (see Rabiner,
L.R. and Levinson, S.E., A Speaker-Independent, Syntax-Directed, Connected
Word Recognition System Based on Hidden Markov Model and Level Building,
IEEE Trans. on ASSP, Vol. 33, No. 3, June 1985).

Statistically, in order to minimize the probability of recognition error, the
voice recognition problem can be formalized as follows: with acoustic evidence
observation O, the operations of voice recognition are to find the most likely
word string W* such that

$$W^{*} = \arg\max_{W} P(W \mid O) \tag{1}$$

where the maximization is over all possible word strings W. In accordance
with Bayes rule, the a posteriori probability P(W|O) in the above equation can be
rewritten as:

$$P(W \mid O) = \frac{P(W)\,P(O \mid W)}{P(O)} \tag{2}$$

Since P(O) is irrelevant to recognition, the word string estimate can be obtained
alternatively as:

$$W^{*} = \arg\max_{W} P(W)\,P(O \mid W) \tag{3}$$

Here P(W) represents the a priori probability that the word string W will be
uttered, and P(O|W) is the probability that the acoustic evidence O will be
observed given that the speaker uttered the word sequence W. P(O|W) is
determined by acoustic pattern matching, while the a priori probability P(W) is
defined by the language model utilized.
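
As a small illustration of the decision rule in equation (3), the sketch below picks the word string that maximizes P(W)P(O|W), working in the log domain to avoid underflow. The candidate strings and all probability values are hypothetical, chosen only to show how a strong prior can outweigh a slightly better acoustic match.

```python
import math

def decode(candidates, log_prior, log_likelihood):
    """Return the candidate W maximizing log P(W) + log P(O|W)."""
    return max(candidates, key=lambda W: log_prior[W] + log_likelihood[W])

# Hypothetical scores for two candidate word strings.
candidates = ["call home", "call Rome"]
log_prior = {"call home": math.log(0.9), "call Rome": math.log(0.1)}          # P(W)
log_likelihood = {"call home": math.log(0.02), "call Rome": math.log(0.03)}   # P(O|W)

best = decode(candidates, log_prior, log_likelihood)
```

Here "call Rome" has the higher acoustic likelihood, but the product 0.9 x 0.02 exceeds 0.1 x 0.03, so the language model prior decides the outcome.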



WO 95/17746    PCT/US94/14803

In connected word recognition, if the vocabulary is small (less than 100),
a deterministic grammar can be used to rigidly govern which words can
logically follow other words to form legal sentences in the language. The
deterministic grammar can be incorporated in the acoustic matching algorithm
implicitly to constrain the search space of potential words and to reduce the
computation dramatically. However, when the vocabulary size is either
medium (greater than 100 but less than 1000) or large (greater than 1000), the
probability of the word sequence, W = (w_1, w_2, ..., w_n), can be obtained by
stochastic language modeling. From simple probability theory, the prior
probability P(W) can be decomposed as

$$P(W) = P(w_1, w_2, \ldots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, w_2, \ldots, w_{i-1})$$

where P(w_i | w_1, w_2, ..., w_{i-1}) is the probability that w_i will be spoken given that
the word sequence (w_1, w_2, ..., w_{i-1}) precedes it. The choice of w_i depends on the
entire past history of input words. For a vocabulary of size V, it requires V^i
values to specify P(w_i | w_1, w_2, ..., w_{i-1}) completely. Even for the mid vocabulary
size, this requires a formidable number of samples to train the language model.
An inaccurate estimate of P(w_i | w_1, w_2, ..., w_{i-1}) due to insufficient training data
will degrade the results of the original acoustic matching.

A practical solution to the above problems is to assume that w_i only
depends on the (k-1) preceding words, w_{i-1}, w_{i-2}, ..., w_{i-k+1}. A stochastic language
model can be completely described in terms of P(w_i | w_{i-1}, w_{i-2}, ..., w_{i-k+1}), from
which the k-gram language model is derived. Since most of the word strings will
never occur in the language if k>3, unigram (k=1), bigram (k=2) and trigram
(k=3) are the most powerful stochastic language models that take grammar into
consideration statistically. Language modeling contains both syntactic and
semantic information which is valuable in recognition, but these probabilities
must be trained from a large collection of speech data. When the available
training data are relatively limited, such that some k-grams may never occur in the
data, P(w_i | w_{i-2}, w_{i-1}) can be estimated directly from the bigram probability
P(w_i | w_{i-1}). Details of this process can be found in F. Jelinek, "The Development of
An Experimental Discrete Dictation Recognizer", Proc. IEEE, vol. 73, pp. 1616-1624,
1985. In connected word recognition, the whole word model is used as the basic
speech unit, while in continuous voice recognition, subword units, such as
phonemes, syllables or demisyllables, may be used as the basic speech unit. The
word decoder will be modified accordingly.
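
The k-gram approximation can be made concrete for the bigram case (k=2). The sketch below estimates P(w_i | w_{i-1}) by relative frequency from a toy corpus and multiplies the terms to approximate P(W). The start symbol `<s>`, the function names and the corpus are assumptions for illustration, and no smoothing or back-off is applied, so unseen bigrams receive probability zero.

```python
from collections import Counter

def train_bigram(sentences):
    """Count histories and bigrams over sentences padded with a start symbol."""
    history, bigrams = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + words
        for prev, cur in zip(padded, padded[1:]):
            history[prev] += 1
            bigrams[(prev, cur)] += 1
    return history, bigrams

def sentence_probability(words, history, bigrams):
    """Approximate P(W) as the product of P(w_i | w_{i-1}) over the sentence."""
    p = 1.0
    padded = ["<s>"] + words
    for prev, cur in zip(padded, padded[1:]):
        if history[prev] == 0:
            return 0.0  # history never seen in training
        p *= bigrams[(prev, cur)] / history[prev]
    return p

# Hypothetical three-sentence training corpus.
corpus = [["call", "home"], ["call", "office"], ["dial", "home"]]
history, bigrams = train_bigram(corpus)
p_seen = sentence_probability(["call", "home"], history, bigrams)
p_unseen = sentence_probability(["dial", "office"], history, bigrams)
```

With this corpus, "call home" scores (2/3) x (1/2), while "dial office" scores zero because the bigram never occurred, which is the sparse-data problem the paragraph above describes.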

Conventional voice recognition systems integrate acoustic processor and
word decoders without taking into account their separability, the limitations of
application systems (such as power consumption, memory availability, etc.)
and communication channel characteristics. This motivates the interest in
devising a distributed voice recognition system with these two components
appropriately separated.

SUMMARY OF THE INVENTION 

The present invention is a novel and improved distributed voice
recognition system, in which (i) the front end acoustic processor can be LPC
based or filter bank based; (ii) the acoustic pattern matching in the word
decoder can be based on hidden Markov model (HMM), dynamic time warping
(DTW) or even neural networks (NN); and (iii) for the connected or continuous
word recognition purpose, the language model can be based on deterministic or
stochastic grammars. The present invention differs from the usual voice
recognizer in improving system performance by appropriately separating the
components: feature extraction and word decoding. As demonstrated in the following
examples, if LPC based features, such as cepstrum coefficients, are to be sent
over a communication channel, a transformation between LPC and LSP can be
used to alleviate the noise effects on the feature sequence.

BRIEF DESCRIPTION OF THE DRAWINGS

The features, objects, and advantages of the present invention will
become more apparent from the detailed description set forth below when
taken in conjunction with the drawings in which like reference characters
identify correspondingly throughout and wherein:

Figure 1 is a block diagram of a traditional speech recognition system;
Figure 2 is a block diagram of an exemplary implementation of the
present invention in a wireless communication environment;
Figure 3 is a general block diagram of the present invention;
Figure 4 is a block diagram of an exemplary embodiment of the
transform element and inverse transform element of the present invention; and
Figure 5 is a block diagram of a preferred embodiment of the present
invention comprising a local word detector in addition to a remote word
detector.

DETAILED DESCRIPTION OF THE PREFERRED 
EMBODIMENTS 

In a standard voice recognizer, either in recognition or in training, most
of the computational complexity is concentrated in the word decoder
subsystem of the voice recognizer. In the implementation of voice recognizers
with distributed system architecture, it is often desirable to place the word
decoding task at the subsystem which can absorb the computational load
appropriately. The acoustic processor, by contrast, should reside as close to the
speech source as possible to reduce the effects of quantization errors introduced
by signal processing and/or channel induced errors.

An exemplary implementation of the present invention is illustrated in
Fig. 2. In the exemplary embodiment, the environment is a wireless
communication system comprising a portable cellular telephone or personal
communications device 40 and a central communications center referred to as a
cell base station 42. In the exemplary embodiment the distributed VR system is
presented. In the distributed VR the acoustic processor or feature extraction
element 22 resides in personal communication device 40 and word decoder 48
resides in the central communications center. If, instead of distributed VR, VR
is implemented solely in the portable cellular phone, it would be highly infeasible
even for medium size vocabulary, connected word recognition due to high
computation cost. On the other hand, if VR resides solely at the base station,
the accuracy can be decreased dramatically due to the degradation of speech
signals associated with speech codec and channel effects. Evidently, there are
three advantages to the proposed distributed system design. The first is the
reduction in cost of the cellular telephone due to the word decoder hardware
that is no longer resident in the telephone 40. The second is a reduction of the
drain on the battery (not shown) of portable telephone 40 that would result
from locally performing the computationally intensive word decoder operation.
The third is the expected improvement in recognition accuracy in addition to
the flexibility and extendibility of the distributed system.

The speech is provided to microphone 20 which converts the speech
signal into electrical signals which are provided to feature extraction element
22. The signals from microphone 20 may be analog or digital. If the signals are
analog, then an analog to digital converter (not shown) may need to be
interposed between microphone 20 and feature extraction element 22. The
speech signals are provided to feature extraction element 22. Feature extraction
element 22 extracts relevant characteristics of the input speech that will be used
to decode the linguistic interpretation of the input speech. One example of
characteristics that can be used to estimate speech is the frequency
characteristics of an input speech frame. This is frequently provided as linear
predictive coding parameters of the input frame of speech. The extracted
features of the speech are then provided to transmitter 24 which codes,
modulates and amplifies the extracted feature signal and provides the features
through duplexer 26 to antenna 28, where the speech features are transmitted to
cellular base station or central communications center 42. Various types of
digital coding, modulation, and transmission schemes well known in the art
may be employed.

At central communications center 42, the transmitted features are
received at antenna 44 and provided to receiver 46. Receiver 46 may perform
the functions of demodulating and decoding the received transmitted features
which it provides to word decoder 48. Word decoder 48 determines, from the
speech features, a linguistic estimate of the speech and provides an action
signal to transmitter 50. Transmitter 50 performs the functions of amplification,
modulation and coding of the action signal, and provides the amplified signal
to antenna 52, which transmits the estimated words or a command signal to
portable phone 40. Transmitter 50 may also employ well known digital coding,
modulation or transmission techniques.

At portable phone 40, the estimated words or command signals are
received at antenna 28, which provides the received signal through duplexer 26
to receiver 30 which demodulates and decodes the signal and then provides the
command signal or estimated words to control element 38. In response to the
received command signal or estimated words, control element 38 provides the
intended response (e.g., dialing a phone number, providing information to the
display screen on the portable phone, etc.).

The same system represented in Figure 2 could also serve in a slightly
different way in that the information sent back from central communications
center 42 need not be an interpretation of the transmitted speech; rather, the
information sent back from central communications center 42 may be a
response to the decoded message sent by the portable phone. For example, one
may inquire of messages on a remote answering machine (not shown) coupled
via a communications network to central communications center 42, in which
case the signal transmitted from central communications center 42 to portable
telephone 40 may be the messages from the answering machine. In this
implementation, a second control element 49 would be collocated in the
central communications center.

The significance of placing the feature extraction element 22 in portable
phone 40 instead of at central communications center 42 is as follows. If the
acoustic processor is placed at central communications center 42, as opposed to
distributed VR, a low bandwidth digital radio channel may require a vocoder
(at the first subsystem) which limits the resolution of feature vectors due to
quantization distortion. However, by putting the acoustic processor in the
portable or cellular phone, one can dedicate the entire channel bandwidth to
feature transmission. Usually, the extracted acoustic feature vector requires less
bandwidth than the speech signal for transmission. Since recognition accuracy
is highly dependent on degradation of the input speech signal, one should provide
feature extraction element 22 as close to the user as possible so that feature
extraction element 22 extracts feature vectors based on microphone speech,
instead of (vocoded) telephone speech which may be additionally corrupted in
transmission.

In real applications, voice recognizers are designed to operate under
ambient conditions, such as background noise. Thus, it is important to consider
the problem of voice recognition in the presence of noise. It has been shown
that, if the training of vocabulary (reference patterns) is performed in the same
(or approximately the same) environment as the test condition, voice recognizers not
only can provide good performance even in very noisy environments, but can
reduce the degradation in recognition accuracy due to noise significantly. The
mismatch between training and test conditions accounts for one of the major
degradation factors in recognition performance. With the assumption that
acoustic features can traverse communication channels more reliably than
speech signals (since acoustic features require less bandwidth than speech
signals for transmission as mentioned previously), the proposed distributed
voice recognition system is advantageous in providing matching conditions. If
a voice recognizer is implemented remotely, the matching conditions can be
badly broken due mainly to channel variations such as fading encountered in
wireless communications. Implementing VR locally may avoid these effects if
the huge training computations can be absorbed locally. Unfortunately, in
many applications, this is not possible. A distributed voice
recognition implementation can therefore avoid mismatch conditions induced by channel
perplexity and compensate for the shortcomings of centralized
implementations.

Referring to Figure 3, the digital speech samples are provided to feature
extraction element 51 which provides the features over communication channel
56 to word estimation element 62 where an estimated word string is
determined. The speech signals are provided to acoustic processor 52 which
determines potential features for each speech frame. Since the word decoder
requires an acoustic feature sequence as input for both recognition and training
tasks, these acoustic features need to be transmitted across the communication
channel 56. However, not all potential features used in typical voice
recognition systems are suitable for transmission through noisy channels. In
some cases, transformation element 22 is required to facilitate source encoding
and to reduce the effects of channel noise. One example of LPC based acoustic
features which are widely used in voice recognizers is cepstrum coefficients,
{c_m}. They can be obtained directly from LPC coefficients {a_i} as follows:

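$$c_m = a_m + \sum_{k=1}^{m-1} \frac{k}{m}\, c_k\, a_{m-k}, \quad m = 1, \ldots, P \tag{5}$$

$$c_m = \sum_{k=1}^{m-1} \frac{k}{m}\, c_k\, a_{m-k}, \quad m = P+1, \ldots, Q \tag{6}$$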

where P is the order of the LPC filter used and Q is the size of the cepstrum feature
vector. Since cepstrum feature vectors change rapidly, it is not easy to
compress a sequence of frames of cepstrum coefficients. However, there exists
a transformation between LPCs and line spectrum pair (LSP) frequencies, which
change slowly and can be efficiently encoded by a differential pulse code
modulation (DPCM) scheme. Since cepstrum coefficients can be derived
directly from LPC coefficients, LPCs are transformed into LSPs by transform
element 54, which are then encoded to traverse the communication channel 56.
At remote word estimation element 62, the transformed potential features are
inverse transformed by inverse transform element 60 to provide acoustic
features to word decoder 64, which in response provides an estimated word
string.
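
The LPC-to-cepstrum recursion of equations (5) and (6) can be sketched as follows. The function name and list layout are assumptions for illustration; index 0 of each list is unused so that the code indices match the equation subscripts, and a_j is treated as zero for j greater than P.

```python
def lpc_to_cepstrum(a, Q):
    """Cepstrum c[1..Q] from LPC coefficients a[1..P] via equations (5) and (6).

    a[0] and c[0] are unused placeholders. The convention assumed is
    A(z) = 1 - sum_i a[i] z^-i.
    """
    P = len(a) - 1
    c = [0.0] * (Q + 1)
    for m in range(1, Q + 1):
        acc = a[m] if m <= P else 0.0      # the a_m term exists only for m <= P
        for k in range(1, m):
            if m - k <= P:                 # a_{m-k} is zero beyond order P
                acc += (k / m) * c[k] * a[m - k]
        c[m] = acc
    return c

# First-order check: for A(z) = 1 - 0.5 z^-1 the cepstrum is 0.5**m / m.
c = lpc_to_cepstrum([0.0, 0.5], 3)
```

The first-order case is a convenient sanity check because the closed form of the cepstrum is known there.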

An exemplary embodiment of the transform element 54 is illustrated in
Figure 4 as transform subsystem 70. In Figure 4, the LPC coefficients from
acoustic processor 52 are provided to LPC to LSP transformation element 72.
Within LPC to LSP element 72 the LSP coefficients can be determined as
follows. For Pth order LPC coefficients, the corresponding LSP frequencies are
obtained as the P roots which reside between 0 and π of the following
equations:

$$P(\omega) = \cos 5\omega + p_1 \cos 4\omega + \cdots + p_5/2 \tag{7}$$

$$Q(\omega) = \cos 5\omega + q_1 \cos 4\omega + \cdots + q_5/2 \tag{8}$$

where p_i and q_i can be computed recursively as follows:

$$p_0 = q_0 = 1 \tag{9}$$
$$p_i = -a_i - a_{P-i} - p_{i-1}, \quad 1 \le i \le P/2 \tag{10}$$

$$q_i = -a_i + a_{P-i} - q_{i-1}, \quad 1 \le i \le P/2 \tag{11}$$

The LSP frequencies are provided to DPCM element 74 where they are encoded
for transmission over communication channel 76.
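
The patent does not specify the DPCM predictor or quantizer, so the following is only a sketch under simple assumptions: a first-order predictor that uses the previously reconstructed frame, and a uniform quantizer with a hypothetical step size. Because LSP frequencies change slowly from frame to frame, the transmitted difference codes stay small after the first frame.

```python
def dpcm_encode(frames, step):
    """Quantize each frame's difference from the previous reconstructed frame."""
    prev = [0.0] * len(frames[0])
    codes = []
    for frame in frames:
        idx = [round((x - p) / step) for x, p in zip(frame, prev)]
        # Track the decoder's reconstruction so quantization errors do not accumulate.
        prev = [p + i * step for p, i in zip(prev, idx)]
        codes.append(idx)
    return codes

def dpcm_decode(codes, step):
    """Rebuild the frames by accumulating the dequantized differences."""
    prev = [0.0] * len(codes[0])
    out = []
    for idx in codes:
        prev = [p + i * step for p, i in zip(prev, idx)]
        out.append(prev)
    return out

# Hypothetical LSP frequency pairs (radians) for three consecutive frames.
frames = [[0.30, 0.60], [0.32, 0.61], [0.33, 0.65]]
codes = dpcm_encode(frames, 0.01)
decoded = dpcm_decode(codes, 0.01)
```

The encoder predicts from its own reconstruction rather than from the original samples, which keeps the decoder's error bounded by half a quantizer step per value.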

At inverse transformation element 78, the received signal from the channel is
passed through inverse DPCM element 80, which recovers the LSP frequencies
of the speech signal, and then through LSP to LPC element 82. The inverse process of LPC to
LSP element 72 is performed by LSP to LPC element 82, which converts the LSP
frequencies back into LPC coefficients for use in deriving the cepstrum
coefficients. LSP to LPC element 82 performs the conversion as follows:

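$$P(z) = (1 + z^{-1}) \prod_{i=1}^{P/2} \left(1 - 2\cos(\omega_{2i-1})\, z^{-1} + z^{-2}\right) \tag{12}$$

$$Q(z) = (1 - z^{-1}) \prod_{i=1}^{P/2} \left(1 - 2\cos(\omega_{2i})\, z^{-1} + z^{-2}\right) \tag{13}$$

$$A(z) = 1 - \sum_{i=1}^{P} a_i z^{-i} = \frac{P(z) + Q(z)}{2}$$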

The LPC coefficients are then provided to LPC to cepstrum element 84 which
provides the cepstrum coefficients to word decoder 64 in accordance with
equations 5 and 6.
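
A round trip between LPC coefficients and LSP frequencies can be sketched as below. The inverse direction follows equations (12) to (14) directly. For the forward direction the sketch does not use the recursive coefficient formulas; instead it assumes the standard construction in which P(z) and Q(z) are the sum and difference of A(z) and its time-reversed copy, and it locates the roots by a grid scan with bisection. The function names, the grid size and the assumptions of an even order P and a minimum-phase A(z) are all illustrative choices.

```python
import math

def poly_mul(p, q):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pc in enumerate(p):
        for j, qc in enumerate(q):
            out[i + j] += pc * qc
    return out

def _roots_on_half_circle(g, grid):
    """Zeros in (0, pi) of a symmetric even-degree polynomial, by scan plus bisection."""
    h = (len(g) - 1) // 2
    def f(w):
        return 2.0 * sum(g[i] * math.cos((h - i) * w) for i in range(h)) + g[h]
    roots = []
    for k in range(grid):
        lo, hi = math.pi * k / grid, math.pi * (k + 1) / grid
        if f(lo) * f(hi) < 0.0:
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if f(lo) * f(mid) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))
    return roots

def lpc_to_lsp(a, grid=2000):
    """a[1..P] (a[0] unused), A(z) = 1 - sum a_i z^-i, P even. Returns sorted LSPs."""
    P = len(a) - 1
    alpha = [1.0] + [-x for x in a[1:]] + [0.0]           # A(z) padded to degree P+1
    psum = [alpha[i] + alpha[P + 1 - i] for i in range(P + 2)]
    qdiff = [alpha[i] - alpha[P + 1 - i] for i in range(P + 2)]
    g_p, g_q = [psum[0]], [qdiff[0]]
    for i in range(1, P + 1):
        g_p.append(psum[i] - g_p[i - 1])                   # divide out (1 + z^-1)
        g_q.append(qdiff[i] + g_q[i - 1])                  # divide out (1 - z^-1)
    return sorted(_roots_on_half_circle(g_p, grid) + _roots_on_half_circle(g_q, grid))

def lsp_to_lpc(lsp):
    """Equations (12)-(14): rebuild P(z), Q(z) and average them to get A(z)."""
    P = len(lsp)
    pz, qz = [1.0, 1.0], [1.0, -1.0]
    for idx, w in enumerate(lsp):
        factor = [1.0, -2.0 * math.cos(w), 1.0]
        if idx % 2 == 0:                                   # omega_1, omega_3, ... go to P(z)
            pz = poly_mul(pz, factor)
        else:                                              # omega_2, omega_4, ... go to Q(z)
            qz = poly_mul(qz, factor)
    alpha = [0.5 * (x + y) for x, y in zip(pz, qz)]
    return [0.0] + [-alpha[i] for i in range(1, P + 1)]
```

For a second-order example with a_1 = 0.5 and a_2 = -0.3, `lpc_to_lsp` yields two frequencies whose cosines are 0.6 and -0.1, and `lsp_to_lpc` maps them back to the original coefficients.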

Since the word decoder relies solely on an acoustic feature sequence,
which can be prone to noise if transmitted directly through the communication
channel, a potential acoustic feature sequence is derived and transformed in the
subsystem 51 as depicted in Figure 3 into an alternative representation that
facilitates transmission. The acoustic feature sequence for use in the word decoder
is obtained afterwards through inverse transformation. Hence, in a distributed
implementation of VR, the feature sequence sent through the air (channel) can
be different from the one actually used in the word decoder. It is anticipated that the
output from transform element 70 can be further encoded by any error
protection schemes that are known in the art.

In Figure 5, an improved embodiment of the present invention is
illustrated. In wireless communication applications, users may desire not to
occupy the communication channel for a small number of simple, but
commonly used voiced commands, in part due to expensive channel access.
This can be achieved by further distributing the word decoding function
between handset 100 and base station 110 in the sense that a voice recognition
system with a relatively small vocabulary size is implemented locally at the handset while
a second voice recognition system with a larger vocabulary size resides remotely
at the base station. They both share the same acoustic processor at the handset. The
vocabulary table in the local word decoder contains the most widely used words or
word strings. The vocabulary table in the remote word decoder, on the other hand,
contains regular words or word strings. Based on this infrastructure, as
illustrated in Figure 5, the average time that the channel is busy may be lessened
and the average recognition accuracy increased.

Moreover, there will exist two groups of voiced commands available.
One, called special voiced commands, corresponds to the commands
recognizable by the local VR, and the other, called regular voiced commands,
corresponds to those not recognized by the local VR. Whenever a special
voiced command is issued, the real acoustic features are extracted for the local
word decoder and the voice recognition function is performed locally without
accessing the communication channel. When a regular voiced command is issued,
transformed acoustic feature vectors are transmitted through the channel and word
decoding is done remotely at the base station.

Since the acoustic features need not be transformed, nor be coded, for
any special voiced command, and the vocabulary size is small for the local VR, the
required computation will be much less than that of the remote one (the
computation associated with the search for the correct word string over possible
vocabularies is proportional to vocabulary size). Additionally, the local voice
recognizer may be modeled with a simplified version of HMM (such as with a
lower number of states, a lower number of mixture components for state output
probabilities, etc.) compared to the remote VR, since the acoustic feature will be fed
directly to the local VR without potential corruption in the channel. This will
enable a local, though limited vocabulary, implementation of VR at the handset
(subsystem 1) where computational load is limited. It is envisioned that the
distributed VR structure can also be used in other target applications different
from wireless communication systems.

Referring to Figure 5, speech signals are provided to acoustic processor
102, which then extracts features, for example LPC based feature parameters,
from the speech signal. These features are then provided to local word
decoder 106 which searches to recognize the input speech signal from its small
vocabulary. If it fails to decode the input word string and determines that the
remote VR should decode it, it signals transform element 104 which prepares
the features for transmission. The transformed features are then transmitted
over communication channel 108 to remote word decoder 110. The
transformed features are received at inverse transform element 112, which
performs the inverse operation of transform element 104 and provides the
acoustic features to remote word decoder element 114 which in response
provides the estimated remote word string.
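
The control flow of Figure 5 can be sketched as a local-first dispatch. Everything in the block is a hypothetical stand-in: the lookup-table "decoder", the placeholder transform and the fake channel function exist only to show the order of operations, not how the elements are actually built.

```python
def recognize(features, local_decode, transform, send_to_remote):
    """Try the small-vocabulary local decoder; fall back to the remote decoder."""
    words = local_decode(features)
    if words is not None:
        return words, "local"             # special voiced command, channel not used
    payload = transform(features)         # e.g. LPC-to-LSP conversion plus coding
    return send_to_remote(payload), "remote"

# Hypothetical stand-ins for the local decoder, transform element and channel.
LOCAL_VOCAB = {(1, 2): "call home"}

def local_decode(features):
    return LOCAL_VOCAB.get(tuple(features))

def transform(features):
    return [2 * x for x in features]      # placeholder, not a real feature transform

def send_to_remote(payload):
    return "remote:" + ",".join(str(x) for x in payload)

hit = recognize([1, 2], local_decode, transform, send_to_remote)
miss = recognize([3, 4], local_decode, transform, send_to_remote)
```

The features are transformed only on the fallback path, which mirrors the point made above that special voiced commands need neither transformation nor coding.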
The previous description of the preferred embodiments is provided to
enable any person skilled in the art to make or use the present invention. The
various modifications to these embodiments will be readily apparent to those
skilled in the art, and the generic principles defined herein may be applied to
other embodiments without the use of the inventive faculty. Thus, the present
invention is not intended to be limited to the embodiments shown herein but is
to be accorded the widest scope consistent with the principles and novel
features disclosed herein.
CLAIMS

1. A voice recognition system comprising:
feature extraction means located at a remote station for receiving a frame
of speech samples and extracting a set of speech features from said frame of
speech samples in accordance with a predetermined feature extraction format
and for providing said set of speech features; and
word decoder located at a central processing station for receiving said
set of speech features and for determining a syntax in accordance with a
predetermined decoding format.

2. The system of Claim 1 wherein said set of features are linear
predictive coding parameters.

3. The system of Claim 1 further comprising a local word detector
collocated in said remote station for determining a syntax in accordance with a
predetermined small vocabulary decoding format.